Storage and Archive Manager File System - SAM-FS

The Storage and Archive Manager (SAM-FS) is a versatile software package that provides cost-effective storage management, archival and retrieval services. SAM-FS automatically and transparently copies files from expensive on-line disk to less expensive automated storage media, and restores the files back on-line as needed. SAM-FS automatically manages available disk capacities at specified thresholds by clearing disk space of copied data.

SAM-FS Capabilities

SAM-FS runs on a Solaris 2.x or later platform. It presents a transparent interface for the users and provides the following capabilities:

  • Archiving - automatically copies files from disk to archival (tape/optical) media.
  • Releasing - automatically maintains disk space at specified thresholds by clearing files that have been archived.
  • Staging - automatically copies files from archival media to disk.
  • Recycling - clears expired archive images from archival media.

    Archiving with SAM-FS is different from the normal archiving process which removes both the data and its pointers (metadata) from on-line storage. With SAM-FS, the archiving operation copies the data from on-line disk storage to one or more removable media volumes, leaving the original data and its metadata also on disk storage.

    Releasing frees up on-line disk space by removing data that has been archived. Although the data has been removed, the metadata remains on-line. To the user the data appears to remain on-line.

    When a released file is accessed, SAM-FS automatically stages, or restores, the file to disk cache. For a sequential read of an off-line file, the read operation tracks directly behind the staging operation, enabling the user to start working with the file before the entire file is staged.

    As users modify files, archive copies associated with old versions should be purged. The Recycler identifies removable media volumes with a large proportion of expired archive copies, and rearchives the useful data to different volumes. Once the useful data has been rearchived, the volume can be relabeled for reuse, or, if an historical record of file changes is required, the recycled volume can be moved to long-term off-site storage.
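
    The volume selection step described above can be illustrated with a minimal sketch. It is not the Recycler's actual implementation: the Volume record, the recycle_candidates function and the 50% threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    vsn: str            # volume serial name
    used_bytes: int     # bytes written to the volume
    expired_bytes: int  # bytes held by expired archive copies


def recycle_candidates(volumes, min_expired_fraction=0.5):
    """Return VSNs whose expired fraction meets the threshold, worst first.

    The 0.5 default is an assumed value, not a documented SAM-FS setting.
    """
    picked = []
    for v in volumes:
        if v.used_bytes == 0:
            continue  # nothing written, nothing to recycle
        fraction = v.expired_bytes / v.used_bytes
        if fraction >= min_expired_fraction:
            picked.append((fraction, v.vsn))
    picked.sort(reverse=True)
    return [vsn for _, vsn in picked]


volumes = [
    Volume("VOL001", 1000, 900),
    Volume("VOL002", 1000, 100),
    Volume("VOL003", 1000, 600),
]
candidates = recycle_candidates(volumes)
```

    In this sketch the useful data on each selected volume would then be rearchived elsewhere before the volume is relabeled or moved off-site.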


    An Unlimited File System

    SAM-FS allows for virtually unlimited file system sizes and number of files in each file system. In practice it is limited only by the number of disk partitions that are configured. A SAM-FS file system can expand beyond the traditional limits by allowing a file system to span multiple partitions/disks, and by allowing the file inode information to dynamically grow.

    A UNIX file system (UFS) normally cannot span more than one physical device. SAM-FS enables file system data to be written across multiple disk partitions through software disk striping.

    Another limit associated with a traditional UNIX file system is the total number of files that can be cataloged in the file system. Because SAM-FS allocates the file information (the inodes) dynamically, the number of files is limited only by the amount of mass storage.


    Device Management

    Device Configuration

    SAM-FS supports optical robots, tape robots, optical drives, tape drives, magnetic disks and disk arrays. A Master Configuration File specifies the hardware devices to use as disk cache and archive storage devices, and associates the robots, their drives and the disk partitions with their file system.

    Storage Family Sets

    A SAM-FS file system can span several partitions/disks. The collection of partitions that make up a file system is known as a storage family set. During initialization of the file system, each partition in the storage family set is labeled to identify the family set to which it belongs.

    When a file system is mounted, the storage family set is specified as the mount device, not the disk partitions.

    Disk Storage Allocation

    SAM-FS uses a patented dual storage allocation unit (DAU) mechanism to address the trade-off between good storage utilization and good I/O performance.

    1. A small DAU performs well in a small-file operating environment. Storage utilization is very good, but file fragmentation is high, resulting in very poor access performance for large files (as the file is read, the potential for head repositioning is high, resulting in a degraded transfer rate).
    2. A large DAU performs well in a large-file operating environment with good access performance but poor disk utilization for small files.

    SAM-FS uses two DAU sizes to allow good access performance for large files without incurring poor storage utilization for small files. The small DAU is 4096 bytes and the large DAU is 16384 bytes. SAM-FS first maps the file to small DAUs and, if large enough, maps the rest of the file to large DAUs. The result is better performance along with more complete usage of the magnetic disk.
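
    The arithmetic behind the two DAU sizes can be sketched as follows. The DAU sizes come from the text above; the number of small DAUs used before switching to large ones is not stated here, so SMALL_DAU_LIMIT = 8 is purely an assumption for illustration, as is the dau_layout function name.

```python
SMALL_DAU = 4096       # bytes, from the text
LARGE_DAU = 16384      # bytes, from the text
SMALL_DAU_LIMIT = 8    # ASSUMPTION: small DAUs used before switching to large


def dau_layout(file_size):
    """Return (small_daus, large_daus, wasted_bytes) for a file of file_size bytes."""
    small_needed = -(-file_size // SMALL_DAU)      # ceiling division
    small = min(small_needed, SMALL_DAU_LIMIT)
    remaining = max(0, file_size - small * SMALL_DAU)
    large = -(-remaining // LARGE_DAU)             # ceiling division
    allocated = small * SMALL_DAU + large * LARGE_DAU
    return small, large, allocated - file_size


small_file = dau_layout(5000)      # fits entirely in small DAUs
large_file = dau_layout(100000)    # spills into large DAUs
```

    Under these assumptions a 5000-byte file occupies two small DAUs rather than a whole 16384-byte unit, while a 100000-byte file places most of its data in large, contiguous units.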

    To ensure that blocks allocated to a given file are close to each other, forward allocation is used. When blocks are assigned to a given file it is very probable that the small blocks will be contiguous in the cylinder. Since forward allocation applies to all blocks allocated, the next block will have a high probability of being near the current block. This prevents the overall scattering of a given file's data blocks which results from the immediate reuse of blocks from deleted files. With forward allocation, blocks from deleted files will eventually be reassigned as a group to new files.

    Disk Striping

    A SAM-FS file system can span multiple partitions by use of software disk striping to improve access to large files. When a file is created, the physical blocks assigned may exist on any disk within the storage family set associated with the file system. It is not a requirement for striped files to exist within a rigid disk assignment order as it is in most striping implementations.

    SAM-FS operates as a file system under Solaris 2, and can utilize the mirroring capability provided through Sun's DiskSuite.


    Removable Media

    Removable Media Recording and Interchange

    SAM-FS is designed to process large numbers of varying types of removable media.

    SAM-FS labels all removable media with ANSI standard volume labels. Each piece of media must be labeled with a unique volume serial name (VSN). SAM-FS writes all archived data using tar format. The use of tar format preserves the original filename, owner and group in effect at the time the file was archived.

    SAM-FS supports the standard UNIX file types of regular file, directory file and symbolic link. In addition, SAM-FS has a new file type, designated as a removable media file, which allows users to access data stored on removable media without knowing its physical location. A removable media file contains the access and position information that identifies the media and file resident on that volume.

    When a user opens a removable media file, SAM-FS requests that the media be mounted if it is not already mounted. Information written to or read from the file is transparently transferred to or from the appropriate physical device.


    Automated Storage Management

    The automated storage management facility of SAM-FS is responsible for managing on-line storage usage of the disk cache and removable media. It is accomplished by four operations: Archiving, Releasing, Staging and Recycling.

    Archiving

    Placing a file in a SAM-FS file system triggers the archiving operation. The Archiver periodically scans each SAM-FS file system, examining the status of the files. SAM-FS automatically makes one archive copy of the file on removable media. To provide additional protection against damage or loss of data, the system administrator can choose to make up to four archive copies simultaneously on a variety of media.

    To ensure that files are complete before archiving, the Archiver allows files to age for a period of time (archive age) before archiving the file.

    SAM-FS provides for archive sets, which are groupings of files that match criteria such as minimum size, maximum size, owner, group or directory location. Each archive set is associated with a collection of removable media. Archive sets control the destination of the archive copy, wait time before the file is archived, and the length of time to retain the archive copy.
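
    A minimal sketch of how files might be matched to archive sets and checked against an archive age follows. It is not the arfind implementation and does not use SAM-FS configuration syntax; FileInfo, ArchiveSet, assign and ready_to_archive are hypothetical names, and first-match-wins ordering is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    path: str
    size: int         # bytes
    owner: str
    modified: float   # last modification time, seconds


@dataclass
class ArchiveSet:
    name: str
    directory: str = "/"              # directory prefix, with trailing slash
    min_size: int = 0
    max_size: Optional[int] = None
    owner: Optional[str] = None
    archive_age: float = 0.0          # seconds a file must age before archiving

    def matches(self, f):
        if not f.path.startswith(self.directory):
            return False
        if f.size < self.min_size:
            return False
        if self.max_size is not None and f.size > self.max_size:
            return False
        if self.owner is not None and f.owner != self.owner:
            return False
        return True


def assign(f, archive_sets):
    """Return the first archive set whose criteria match (assumed ordering rule)."""
    for s in archive_sets:
        if s.matches(f):
            return s
    return None


def ready_to_archive(f, archive_set, now):
    """A file is eligible once it has been unmodified for the set's archive age."""
    return now - f.modified >= archive_set.archive_age


sets = [
    ArchiveSet("big", "/sam/", min_size=1_000_000, archive_age=3600),
    ArchiveSet("default", "/sam/", archive_age=600),
]
data = FileInfo("/sam/data/a.dat", 5_000_000, "alice", modified=1000.0)
chosen = assign(data, sets)
```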

    In addition to the default archive time set by the system administrator, the user has the option to:

    1. Archive a file immediately.
    2. Never archive a file.

    The Archiver consists of three programs: archiver, arfind and arcopy. The archiver program schedules the archiving activity, arfind assigns files to be archived to archive sets, and arcopy copies the files to be archived to the selected media.

    Releasing

    High and low thresholds are used to manage the usage of the disk cache. When disk usage exceeds the high threshold, archived files are automatically released until usage falls to the low threshold.
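
    The threshold behavior can be sketched as below. This is an illustration only, not the releaser's algorithm: the CachedFile record and select_for_release function are hypothetical, and releasing the least recently accessed files first is an assumed ordering that the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    name: str
    size: int            # bytes resident on disk cache
    archived: bool       # True once an archive copy exists
    last_access: float   # seconds


def select_for_release(files, capacity, high, low):
    """Pick archived files to release once usage exceeds the high threshold.

    high and low are fractions of capacity. Release order (oldest access
    first) is an ASSUMPTION made for this sketch.
    """
    used = sum(f.size for f in files)
    if used <= capacity * high:
        return []                      # below the high threshold: do nothing
    target = capacity * low
    released = []
    for f in sorted(files, key=lambda f: f.last_access):
        if used <= target:
            break                      # low threshold reached
        if f.archived:                 # only archived files may be released
            released.append(f.name)
            used -= f.size
    return released


cache = [
    CachedFile("a", 300, True, 1.0),
    CachedFile("b", 300, False, 2.0),
    CachedFile("c", 200, True, 3.0),
    CachedFile("d", 100, True, 4.0),
]
released = select_for_release(cache, capacity=1000, high=0.8, low=0.5)
```

    Note that file "b" is skipped because it has no archive copy yet, which mirrors the rule that only archived files are released.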

    The user has the option to:

    1. Release a file immediately after it is archived.
    2. Never release the file.
    3. Partially release the file, retaining the first portion of the file on disk cache to avoid unnecessary staging by applications which read only the beginning of the file.

    It is possible to fill up the disk cache because writing to magnetic disk occurs at a much greater rate than writing to removable media. If the disk cache is full, writes are suspended until files are archived and disk space is made available by the releaser.

    Staging

    Staging is done automatically by default when an off-line file is accessed. For a sequential read of a near-line file, the read operation tracks along directly behind the staging operation. This means that the stage does not have to be complete before the system begins returning data to the user. SAM-FS responds to the user's request as soon as the portion of the file needed to satisfy the request has been staged back on-line.

    The user is given the flexibility to:

    1. Stage the file immediately: There is an option to wait or not wait for the stage to complete.
    2. Never stage the file: Some applications randomly access small records from many large files. With the never stage option, the data is accessed directly from the archive media without staging the file on-line.
    3. Stage all the files in a directory when only one file is accessed.

    Associative Staging(TM) -- Associative Staging is an attribute of SAM-FS that can be assigned to a file or a directory. When the attribute is enabled, accessing, and thereby staging a file, causes every file in the requested file's directory (that also has the attribute enabled) to be staged as well.

    Staging of the initial file proceeds normally. The other files with the Associative Staging attribute enabled are staged as well, so they are immediately available when the user requests them. Associative Staging not only reduces manual intervention by the user and speeds access to related files, it also significantly reduces robot motion and media shuffling.
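
    The selection rule for Associative Staging can be sketched as follows. The catalog dictionary and the files_to_stage function are hypothetical and stand in for file system metadata; skipping files that are already on-line is an assumption of the sketch.

```python
import posixpath


def files_to_stage(requested, catalog):
    """Return the paths to stage when `requested` is accessed.

    catalog maps path -> {"offline": bool, "associative": bool}; this
    structure is a stand-in for real metadata, not a SAM-FS interface.
    """
    to_stage = [requested]
    if catalog[requested]["associative"]:
        directory = posixpath.dirname(requested)
        for path, attrs in sorted(catalog.items()):
            if path == requested:
                continue
            same_dir = posixpath.dirname(path) == directory
            # Only siblings with the attribute enabled that are off-line
            if same_dir and attrs["associative"] and attrs["offline"]:
                to_stage.append(path)
    return to_stage


catalog = {
    "/sam/run1/a": {"offline": True, "associative": True},
    "/sam/run1/b": {"offline": True, "associative": True},
    "/sam/run1/c": {"offline": True, "associative": False},
    "/sam/run1/d": {"offline": False, "associative": True},
    "/sam/run2/e": {"offline": True, "associative": True},
}
staged = files_to_stage("/sam/run1/a", catalog)
```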

    Direct Access and Pre-Staging -- To facilitate the efficient use of on-line storage and provide quick access to near-line data, a file can be marked with the "never stage" attribute. This means the file will be accessed directly from the removable media. No stage is done and the file remains off-line. Direct access allows large near-line databases to be efficiently accessed. Direct access is supported on files which are resident on tape or optical disk.

    For applications which need to access large portions of the data, the file can be pre-staged to disk. Although not a requirement on the part of the user, pre-staging may provide improved usage of device resources by allowing stage requests for files resident on the same archive media to be batched together.
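
    Batching stage requests by archive media might look like the sketch below. The request tuples and the batch_by_volume function are hypothetical, and ordering each batch by position on the media is an assumption intended to show why batching can reduce mounts and repositioning.

```python
from collections import defaultdict


def batch_by_volume(requests):
    """Group stage requests by volume serial name (VSN).

    requests: iterable of (path, vsn, position) tuples, where position is
    an assumed numeric offset of the archive copy on the volume.
    Returns {vsn: [paths ordered by position]}.
    """
    batches = defaultdict(list)
    for path, vsn, position in requests:
        batches[vsn].append((position, path))
    return {vsn: [path for _, path in sorted(items)]
            for vsn, items in batches.items()}


requests = [
    ("/sam/a", "VOL001", 40),
    ("/sam/b", "VOL002", 10),
    ("/sam/c", "VOL001", 5),
]
batches = batch_by_volume(requests)
```

    With the requests grouped this way, each volume needs to be mounted only once to satisfy all of its pending stages.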

    Archiving vs Backup

    The differences between archiving and backing up are:

    1. After SAM-FS has archived a file, its alternate copies exist for the life of the file and need not be stored on any other medium, including on-line storage.
    2. The archive mechanism of SAM-FS provides immediate availability of an archived file. Through SAM-FS the operating system views off-line archive storage as an addressable extension of the primary on-line disk storage.

    Archiving provides control over data storage costs and vulnerability. Archived data residing in off-line storage no longer needs to remain on-line. Files archived by the system can be considered backed up. Short-term and often-referenced data can be stored on-line on disk, and long-term data stored either on tape or optical disk.

    Backup systems make a snapshot of the current state of the file system. Recovery of a file (usually due to loss) involves an extraction process which copies the file from the backup media onto on-line storage. SAM-FS provides a backup utility, samfsdump, for backing up metadata.


    Data Integrity

    Protecting Data

    SAM-FS provides a framework to help protect data against accidental loss: